Probabilistic Exploration in Planning while Learning

Author

  • Grigoris I. Karakoulas
Abstract

Sequential decision tasks with incomplete information are characterized by the exploration problem; namely, the trade-off between further exploration, for learning more about the environment, and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning, and in Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and exhibit limited scalability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e. exploration). The experimentation should be sufficient for selecting, with statistical significance, a locally optimal plan (i.e. exploitation). For this purpose, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed for selecting a plan which is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Due to its generality, the algorithm can be employed as the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy performs better than a typical exploration strategy.

…continuous flow of events in time. Effective decision-making requires resolution of uncertainty as early as possible. The tendency to minimize losses resulting from wrong predictions of future events necessitates the division of the problem solution into steps. A decision at each step must make use of the information from the evolution of the events experienced thus far, but that evolution, in fact, depends on the type of decision made at each step.
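To make the abstract's idea of "experimenting until a plan can be selected with statistical significance" concrete, the following Python fragment is a minimal illustrative sketch, not the paper's probabilistic hill-climbing algorithm: it pairs a standard one-step Q-learning backup with a rough confidence-interval test, so the behaviour policy keeps exploring an action only while it is still statistically indistinguishable from the current best one. All names, constants, and the interval construction are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

# Illustrative sketch only (not the algorithm from the paper): one-step
# Q-learning whose behaviour policy keeps experimenting with an action
# only while its confidence interval still overlaps that of the greedy
# action, i.e. exploration stops once a choice can be made with
# (approximate) statistical significance.

ALPHA, GAMMA, Z = 0.1, 0.95, 1.96   # learning rate, discount, ~95% z-value

q = defaultdict(float)      # Q(s, a): standard Q-learning estimate
n = defaultdict(int)        # number of backups applied to (s, a)
t_sum = defaultdict(float)  # running sum of observed backup targets
t_sq = defaultdict(float)   # running sum of squared backup targets

def confidence_interval(s, a):
    """Approximate interval around the mean observed backup target."""
    k = n[(s, a)]
    if k < 2:
        return (-math.inf, math.inf)            # not enough data yet
    mean = t_sum[(s, a)] / k
    var = max(t_sq[(s, a)] / k - mean * mean, 1e-8)
    half = Z * math.sqrt(var / k)
    return (mean - half, mean + half)

def choose_action(s, actions):
    """Exploit the greedy action unless some rival's upper bound still
    exceeds the greedy action's lower bound; keep exploring in that case."""
    best = max(actions, key=lambda a: q[(s, a)])
    low_best, _ = confidence_interval(s, best)
    rivals = [a for a in actions
              if a != best and confidence_interval(s, a)[1] > low_best]
    return random.choice(rivals) if rivals else best

def update(s, a, reward, s_next, actions):
    """One-step Q-learning backup plus bookkeeping for the intervals."""
    target = reward + GAMMA * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += ALPHA * (target - q[(s, a)])
    n[(s, a)] += 1
    t_sum[(s, a)] += target
    t_sq[(s, a)] += target * target
```

The design choice illustrated here is that the amount of exploration is driven by the remaining statistical uncertainty about each action's value rather than by a fixed schedule such as epsilon-greedy, which is the general flavour of the strategy the abstract describes.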


Related articles

Probabilistic Exploration in Planning while Learning

In decision-theoretic planning Pemberton and Korf (1994) have proposed separate heuristic functions for exploration and decision-making in incremental real-time search algorithms. Draper et al. (1994) have developed a probabilistic planning algorithm that performs both information-producing actions and contingent planning actions. Our exploration strategy could be applied to these planning task...


FF + FPG: Guiding a Policy-Gradient Planner

The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG’s weakness is potentially long learning time...


Exploration of Arak Medical Students’ Experiences on Effective Factors in Active Learning: A Qualitative Research

Introduction: Medical students should use active learning to improve their daily duties and medical services. The goal of this study is to explore medical students' experiences of effective factors in active learning. Methods: This qualitative study was conducted using the content analysis method at Arak University of Medical Sciences. Data were collected via interviews. The study started with p...


On-line Learning of Macro Planning Operators using Probabilistic Estimations of Cause-Effects

In this work we propose an on-line learning method for learning action rules for planning. The system uses a probabilistic approach of a constructive induction method that combines a beam search with an example-based search over candidate rules to find those that more concisely describe the world dynamics. The approach permits a rapid integration of the knowledge acquired from experience. Explo...


Action Schema Networks: Generalised Policies with Deep Learning

In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the net...



Publication year: 1995